Pitfalls of hypothesis tests and model selection on bootstrap samples: Causes and consequences in biometrical applications.

Authors

  • Silke Janitza
  • Harald Binder
  • Anne-Laure Boulesteix
Abstract

The bootstrap method has become a widely used tool applied in diverse areas where results based on asymptotic theory are scarce. It can be applied, for example, for assessing the variance of a statistic, a quantile of interest or for significance testing by resampling from the null hypothesis. Recently, some approaches have been proposed in the biometrical field where hypothesis testing or model selection is performed on a bootstrap sample as if it were the original sample. P-values computed from bootstrap samples have been used, for example, in the statistics and bioinformatics literature for ranking genes with respect to their differential expression, for estimating the variability of p-values and for model stability investigations. Procedures which make use of bootstrapped information criteria are often applied in model stability investigations and model averaging approaches as well as when estimating the error of model selection procedures which involve tuning parameters. From the literature, however, there is evidence that p-values and model selection criteria evaluated on bootstrap data sets do not represent what would be obtained on the original data or new data drawn from the overall population. We explain the reasons for this and, through the use of a real data set and simulations, we assess the practical impact on procedures relevant to biometrical applications in cases where it has not yet been studied. Moreover, we investigate the behavior of subsampling (i.e., drawing from a data set without replacement) as a potential alternative solution to the bootstrap for these procedures.
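The core effect described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' study design; it is a minimal toy example under assumed settings (standard normal data, a one-sample test of a zero mean with a normal approximation, subsamples of half the original size, and the hypothetical helper names `z_test_pvalue` and `rejection_rates`). It compares how often a test at the 5% level rejects a true null hypothesis when it is applied to the original sample, to a bootstrap sample treated as if it were the original, and to a subsample drawn without replacement.

```python
import math
import numpy as np

def z_test_pvalue(x):
    """Two-sided p-value for H0: mean = 0, using a normal approximation."""
    z = math.sqrt(len(x)) * x.mean() / x.std(ddof=1)
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

def rejection_rates(n=100, n_sim=2000, alpha=0.05, seed=0):
    """Share of simulated data sets in which H0 is rejected, per sample type."""
    rng = np.random.default_rng(seed)
    rejections = {"original": 0, "bootstrap": 0, "subsample": 0}
    for _ in range(n_sim):
        x = rng.normal(0.0, 1.0, size=n)                 # H0 is true by construction
        boot = rng.choice(x, size=n, replace=True)       # bootstrap: with replacement
        sub = rng.choice(x, size=n // 2, replace=False)  # subsample: without replacement
        samples = (("original", x), ("bootstrap", boot), ("subsample", sub))
        for name, sample in samples:
            rejections[name] += z_test_pvalue(sample) < alpha
    return {name: count / n_sim for name, count in rejections.items()}

rates = rejection_rates()
print(rates)
```

In this toy setting the bootstrap sample carries two sources of variability (the original draw and the resampling), so its test statistic is more spread out than the null distribution assumes and the rejection rate should lie clearly above the nominal 5%. The subsample is itself a simple random sample of the population, so its rejection rate should stay near the nominal level, at the cost of a smaller sample size.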

Similar resources

Categorical variables with many categories are preferentially selected in bootstrap-based model selection procedures for multivariable regression models.

Automated variable selection procedures, such as backward elimination, are commonly employed to perform model selection in the context of multivariable regression. The stability of such procedures can be investigated using a bootstrap-based approach. The idea is to apply the variable selection procedure on a large number of bootstrap samples successively and to examine the obtained models, for ...

On studentising and blocklength selection for the bootstrap on time series.

For independent data, non-parametric bootstrap is realised by resampling the data with replacement. This approach fails for dependent data such as time series. If the data generating process is at least stationary and mixing, the blockwise bootstrap by drawing subsamples or blocks of the data saves the concept. For the blockwise bootstrap a blocklength has to be selected. We propose a method fo...

The Analysis of the Existence of the Hypothesis of Adverse Selection on the Relationship between Off-balance Sheet Items and the Bank's Risk

The balance sheet alone does not show all of a bank's activities, because banks can enter into many swap contracts, exchanges, obligations, and commitments outside the balance sheet. Activities and exchanges that do not appear on the balance sheet are called off-balance sheet activities. These items are usually reported in the notes to the attached financial statements....

Applications of binary segmentation to the estimation of quantal response curves and spatial intensity.

This paper explores the use of binary segmentation procedures in two applications. The first application is concerned with the estimation of nonparametric quantal response curves. With Bernoulli data and an assumed monotone increasing curve, this gives rise to a change-point model where the change points are determined using a sequence of nested hypothesis tests of whether a change point exists. T...

Bootstrap Tilting Confidence Intervals and Hypothesis Tests

Bootstrap tilting confidence intervals could be the method of choice in many applications for reasons of both speed and accuracy. With the right implementation, tilting intervals are 37 times as fast as bootstrap BC-a limits, in terms of the number of bootstrap samples needed for comparable simulation accuracy. Thus 100 bootstrap samples might suffice instead of 3700. Tilting limits have other de...

Journal title:
  • Biometrical journal. Biometrische Zeitschrift

Volume 58, Issue 3

Pages -

Publication date: 2016